TAPE: Temporal Attention-Based Probabilistic Human Pose and Shape Estimation
Authors
Abstract
Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping, and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https://github.com/nikosvasilik/TAPE
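As a rough illustration of the two stages the abstract describes (attention across frames, then a per-frame probability distribution conditioned on the resulting temporal features), the following Python sketch may help. It is not the authors' implementation: the identity attention projections, the single affine flow layer, the fixed `flow_params` conditioner, and the 2-D toy "pose" are all assumptions chosen only to keep the example small and runnable.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames):
    """Scaled dot-product self-attention across time.

    frames: list of T feature vectors (lists of floats, equal length d).
    Identity query/key/value projections are an assumption made for
    brevity; a trained model would learn these projections.
    """
    d = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)])
    return out


def flow_params(context):
    """Hypothetical conditioner mapping a context vector to (shift, log_scale).

    A real conditioner would be a learned network; this fixed rule exists
    only so that the example runs.
    """
    shift = list(context)
    log_scale = [-abs(c) for c in context]
    return shift, log_scale


def sample_pose(context, rng):
    """Draw one sample by pushing Gaussian noise through the affine flow."""
    shift, log_scale = flow_params(context)
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(shift, log_scale)]


def log_prob(pose, context):
    """Log-density of `pose` under the flow (change-of-variables formula)."""
    shift, log_scale = flow_params(context)
    total = 0.0
    for x, m, s in zip(pose, shift, log_scale):
        z = (x - m) * math.exp(-s)  # invert the flow: data -> base noise
        # standard-normal log-density of z, minus log|dx/dz| = s
        total += -0.5 * (z * z + math.log(2.0 * math.pi)) - s
    return total


rng = random.Random(0)
video = [[0.1, 0.4], [0.2, 0.3], [0.9, -0.5]]  # 3 frames, 2-D toy features
contexts = temporal_attention(video)           # temporally informed features
samples = [sample_pose(c, rng) for c in contexts]
densities = [log_prob(x, c) for x, c in zip(samples, contexts)]
```

Each frame gets its own distribution, but its parameters depend on every frame in the clip through the attention step, which is the sense in which the output is per-frame yet temporally informed. Because the flow is invertible, the same model supports both sampling several plausible poses and scoring a candidate pose, and the scoring direction is what would let such a model act as a prior term in an optimization-based fit.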
Similar resources
Probabilistic Mapping of Human Visual Attention from Head Pose Estimation
Effective interaction between a human and a robot requires the bidirectional perception and interpretation of actions and behavior. While actions can be identified as a directly observable activity, this might not be sufficient to deduce actions in a scene. For example, orienting our face toward a book might suggest the action toward “reading.” For a human observer, this deduction requires the ...
Probabilistic Temporal Head Pose Estimation Using a Hierarchical Graphical Model
We present a hierarchical graphical model to probabilistically estimate head pose angles from real-world videos, that leverages the temporal pose information over video frames. The proposed model employs a number of complementary facial features, and performs feature level, probabilistic classifier level and temporal level fusion. Extensive experiments are performed to analyze the pose estimati...
Towards Accurate Markerless Human Shape and Pose Estimation over Time
Existing markerless motion capture methods often assume known backgrounds, static cameras, and sequence specific motion priors, limiting their application scenarios. Here we present a fully automatic method that, given multi-view videos, estimates 3D human pose and body shape. We take the recently proposed SMPLify method [12] as the base method and extend it in several ways. First we fit a 3D h...
Towards Accurate Markerless Human Shape and Pose Estimation over Time
We address the problem of accurately estimating human shape, pose, and motion from images and video without markers or special cameras. Existing methods often assume known backgrounds, static cameras, and sequence specific motion priors. Here we propose a method that is fully automatic and, given multi-view video, estimates 3D human motion and body shape. Our work is built upon the recent SMPLi...
Pose-conditioned Spatio-Temporal Attention for Human Action Recognition
We address human action recognition from multi-modal video data involving articulated pose and RGB frames and propose a two-stream approach. The pose stream is processed with a convolutional model taking as input a 3D tensor holding data from a sub-sequence. A specific joint ordering, which respects the topology of the human body, ensures that different convolutional layers correspond to meanin...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-31438-4_28